HH-MOTiF: de novo detection of short linear motifs in proteins by Hidden Markov Model comparisons
Abstract
Short linear motifs (SLiMs) in proteins are self-sufficient functional sequences that specify interaction sites for other molecules and thus mediate a multitude of functions. Both computational and experimental biological research would benefit significantly if SLiMs in proteins could be predicted correctly de novo with high sensitivity. However, de novo SLiM prediction is a difficult computational task: judged by recall and precision, the performance of published methods shows that challenges in SLiM discovery remain. We have developed HH-MOTiF, a web-based method for SLiM discovery in sets of mainly unrelated proteins. HH-MOTiF makes use of evolutionary information by creating a Hidden Markov Model (HMM) for each input sequence and its closely related orthologs. The HMMs are compared against each other to retrieve short stretches of homology that represent potential SLiMs. These are transformed into hierarchical structures, which we refer to as motif trees, for further processing and evaluation. Our approach allows us to identify degenerate SLiMs while still maintaining reasonably high precision. On a balanced measure of recall and precision, HH-MOTiF performs better on test data than other SLiM discovery methods. HH-MOTiF is freely available as a web server at http://hh-motif.biochem.mpg.de.
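The abstract compares methods on "a balanced measure for recall and precision" without naming it. A common choice for such a measure is the F1 score, the harmonic mean of the two; the sketch below assumes F1 and assumes that predictions are scored per residue position, neither of which is confirmed by the abstract. The function name and the example positions are hypothetical.

```python
def precision_recall_f1(predicted, annotated):
    """Score predicted motif positions against annotated ones.

    Assumption: both arguments are collections of residue indices in a
    single protein. The abstract does not specify the scoring level.
    """
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: a broad prediction that covers the whole motif
# but also includes six residues outside it.
p, r, f1 = precision_recall_f1(range(10, 20), range(12, 16))
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this example the prediction recovers every annotated residue (recall 1.0), but only 4 of its 10 positions are correct (precision 0.4). The harmonic mean penalizes that imbalance, which is why a measure of this kind rewards methods that find degenerate motifs without over-predicting.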
Related resources
A modular Bayesian model for de novo motif detection
The complexity of the global organization and internal structure of motifs in higher eukaryotic organisms raises significant challenges for motif detection techniques. To achieve successful de novo motif detection it is necessary to model the complex dependencies within and among motifs and incorporate biological prior knowledge. In this paper, we present LOGOS, an integrated LOcal and GlObal m...
LOGOS: a modular Bayesian model for de novo motif detection
The complexity of the global organization and internal structures of motifs in higher eukaryotic organisms raises significant challenges for motif detection techniques. To achieve successful de novo motif detection it is necessary to model the complex dependencies within and among motifs and incorporate biological prior knowledge. In this paper, we present LOGOS, an integrated LOcal and GlObal ...
Motif Recognition in Genomes
A well-known and mathematically well-established approach to motif detection in DNA is that of Hidden Markov Models (HMMs), where the model is trained using a set of training sequences. A major property of k-th order Markov models is their locality, i.e. the probability of a state depends only on the last k states. However, especially when 3-dimensional structure plays a role in the recognition ...
Aligning sequences with repetitive motifs
Pairwise sequence alignment is among the most intensively studied problems in computational biology. We present a method for alignment of two sequences containing repetitive motifs. This is motivated by biological studies of proteins with zinc finger domains, an important group of regulatory proteins. Due to their evolutionary history, sequences of these proteins contain a variable number of dif...
ELM: the status of the 2010 eukaryotic linear motif resource
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict ...